The physics of compressive sensing (CS) and the gradient-based recovery algorithms are presented. First, the different forms for CS are summarized. Second, the physical meanings of coherence and measurement are given. Third, the gradient-based recovery algorithms and their geometry explanations are provided. Finally, we

1. Introduction

The well-known Nyquist/Shannon sampling theorem, which states that the sampling rate must be at least twice the maximum frequency of the signal, is a golden rule used in visual and audio electronics, medical imaging devices, radio receivers, and so on. However, can we recover a signal from a small number of linear measurements? Yes, we can, as Emmanuel J. Candès, Justin Romberg, and Terence Tao answered firmly [1] [2] [3]. Several years ago they brought us the tool called Compressive Sensing (CS) [4] [5] [6], which avoids large digital data sets and enables us to build the data compression directly into the acquisition. The mathematical theory underlying CS is deep and beautiful and draws from diverse fields, but we do not focus on the mathematical proofs. Here, we give some physical explanations and discuss relevant recovery algorithms.

2. Exact Recovery of Sparse Signals 

Given a time-domain signal $f \in R^{N \times 1}$, there are four different forms for CS. In what follows, a tilde marks a transform-domain quantity, i.e. $\tilde{f} = \Psi^T f$.

(a) If $f$ is sparse in the time-domain and the measurements are also acquired in the time-domain, then the optimization problem can be given by

$$\min \|f\|_1 \quad \mathrm{s.t.} \quad M_0 f = y \qquad (1)$$

where $M_0 \in R^{M \times N}$ is the observation matrix and $y \in R^{M \times 1}$ are the measurements.

(b) If $f$ is sparse in the time-domain and the measurements are acquired in the transform-domain (Fourier transform, discrete cosine transform, wavelet transform, X-let transform, etc.), then the optimization problem can be given by

$$\min \|\Psi \tilde{f}\|_1 \quad \mathrm{s.t.} \quad M_0 \tilde{f} = \tilde{y} \qquad (2)$$

where $\Psi$ is the inverse transform matrix and satisfies $\Psi \Psi^T = \Psi^T \Psi = I$.

(c) If $f$ is sparse in the transform-domain and the measurements are acquired in the time-domain, then the optimization problem can be given by

$$\min \|\Psi^T f\|_1 \quad \mathrm{s.t.} \quad M_0 f = y \qquad (3)$$

(d) If $f$ is sparse in the transform-domain and the measurements are also acquired in the transform-domain, then the optimization problem can be given by

$$\min \|\tilde{f}\|_1 \quad \mathrm{s.t.} \quad M_0 \tilde{f} = \tilde{y} \qquad (4)$$

From the above equations, the meaning of sparsity can be generalized. If the number of non-zero elements is very small compared with the length of the time-domain signal, the signal is sparse in the time-domain. If the $K$ most important components in the transform-domain can represent the signal accurately, we say the signal is sparse in the transform-domain: we can set the other, unimportant components to zero and apply the inverse transform, and the time-domain signal is reconstructed with very small numerical error. The sparsity property also makes lossy data compression possible. In image processing, the derivatives of the image (especially for a geometric image) along the horizontal and vertical directions are sparse. For the physics community, we can say the wave function is sparse in specific basis representations. Before you go into the CS world, you must know what is sparse in what domain.

The second question is how many measurements $y$ are needed in order to perfectly recover the $K$-sparse signal. Usually, $M > K \log_2(N)$ or $M \approx 4K$ for the general signal or image. Further, if the signal $f$ is sparse in the transform-domain $\Psi$ and the measurements are acquired in the time-domain, then $M > \chi(\Psi, M_0) K \log_2(N)$, where $\chi(\Psi, M_0)$ is the coherence index between the basis system $\Psi$ and the measurement system $M_0$ [3]. Incoherence leads to a small $\chi$, and therefore fewer measurements are required. The coherence index $\chi$ can be easily found if we rewrite (3) as

$$\min \|\tilde{f}\|_1 \quad \mathrm{s.t.} \quad M_0 \Psi \tilde{f} = y \qquad (5)$$

Similarly, (2) can be rewritten as

$$\min \|f\|_1 \quad \mathrm{s.t.} \quad M_0 \Psi^T f = \tilde{y} \qquad (6)$$
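The notion of transform-domain sparsity used in this section can be illustrated numerically. The following is a minimal sketch, assuming NumPy and an orthonormal discrete cosine transform as the basis; the helper names `dct_matrix` and `keep_largest` and the test signal are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (rows are basis vectors), so C @ C.T = I."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def keep_largest(coeffs, K):
    """Set all but the K largest-magnitude coefficients to zero."""
    out = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[-K:]
    out[idx] = coeffs[idx]
    return out

N, K = 64, 2
C = dct_matrix(N)      # forward transform, plays the role of Psi^T
Psi = C.T              # inverse transform matrix, Psi @ Psi.T = I

# A signal that is dense in time but exactly 2-sparse in the DCT domain.
f = 3.0 * Psi[:, 5] + 1.5 * Psi[:, 12]
f_tilde = C @ f                          # transform-domain coefficients
f_rec = Psi @ keep_largest(f_tilde, K)   # keep K components, invert

rel_err = np.linalg.norm(f - f_rec) / np.linalg.norm(f)
print("relative reconstruction error:", rel_err)
```

Keeping only the $K$ largest coefficients is exactly the lossy compression step described above; for a signal that is not exactly sparse, the error would be small rather than at rounding level.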



Third, what are the inherent properties of the observation matrix? The observation matrix obeys what is known as a uniform uncertainty principle (UUP):

$$C_1 \frac{M}{N} \le \frac{\|M_0 f\|_2^2}{\|f\|_2^2} \le C_2 \frac{M}{N} \qquad (7)$$

where $C_1 < 1 < C_2$. An alternative condition, called the restricted isometry property (RIP), can be given by

$$1 - \delta_K \le \frac{\|M_0 f\|_2^2}{\|f\|_2^2} \le 1 + \delta_K \qquad (8)$$

where $\delta_K$ is a constant that is not too close to 1. These properties show three facts: (a) The measurements $y$ maintain the energy of the original time-domain signal $f$. In other words, the measurement process is stable. (b) If $f$ is sparse, then $M_0$ must be dense. This is the reason why the theorem is called UUP. (c) If we want to perfectly recover $f$ from the measurements $y$, at least $2K$ measurements are required. According to the UUP and RIP theorems, it is convenient to set the observation matrix $M_0$ to a random matrix (normal distribution, uniform distribution, or Bernoulli distribution).
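The energy-preserving behaviour expressed by (8) can be checked empirically for a random observation matrix. This is a minimal sketch, assuming NumPy and a Gaussian matrix whose entries have variance $1/M$ (a common normalization that makes the ratio equal to 1 on average); the sizes and the number of trials are illustrative assumptions. It only samples random sparse signals and does not prove the RIP.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K, trials = 64, 256, 8, 200   # illustrative sizes, not from the paper

# Gaussian observation matrix, scaled so that E[||M0 f||^2] = ||f||^2.
M0 = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

ratios = np.empty(trials)
for t in range(trials):
    f = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)  # random K-sparse support
    f[support] = rng.normal(size=K)
    ratios[t] = np.sum((M0 @ f) ** 2) / np.sum(f ** 2)

# The energy ratios cluster around 1, as the bounds in (8) suggest.
print(ratios.min(), ratios.mean(), ratios.max())
```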

Fourth, why is the $\ell_1$ norm used in (1)(2)(3)(4)? In real applications, the number of measurements satisfies $M \ll N$. As a result, one faces the problem of solving an underdetermined matrix equation. In other words, there is a huge number of different candidate signals that could all result in the given measurements. Thus, one must introduce some additional constraint to select the best candidate. The classical solution to such problems is to minimize the $\ell_2$ norm (the pseudo-inverse solution), which minimizes the amount of energy in the system. However, this leads to poor results for most practical applications, as the recovered signal seldom has zero components. A more attractive solution would be to minimize the $\ell_0$ norm, or equivalently to maximize the number of zero components in the basis system. However, this is NP-hard (it contains the subset-sum problem), and so is computationally infeasible for all but the tiniest data sets. Thus, the $\ell_1$ norm, or the sum of the absolute values, is usually what is minimized. Finding the candidate with the smallest $\ell_1$ norm can be expressed relatively easily as a linear program, a convex optimization problem for which efficient solution methods already exist. This leads to results comparable to those of the $\ell_0$ norm, often yielding solutions with many zero components. For simplicity, we take the 2-D case as an example. The boundaries of the $\ell_0$, $\ell_1$, and $\ell_2$ norms are a cross, a diamond, and a circle, respectively (see Fig. 1). The underdetermined matrix equation can be seen as a straight line. If the intersection between the straight line and the boundary is located on the x-axis or y-axis, the recovered result will be sparse. The intersection will almost always be located on the axes if $p \le 1$ for the $\ell_p$ norm.
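The 2-D geometric argument can be made concrete with a single linear constraint $a \cdot f = b$, which is a straight line in the plane. The following is a minimal sketch, assuming NumPy; the function names and the numbers are illustrative assumptions. For one equation, the minimum-$\ell_2$ point is the pseudo-inverse solution, while a minimum-$\ell_1$ point puts all the weight on the coordinate with the largest $|a_i|$, i.e. on an axis.

```python
import numpy as np

def min_l2_on_line(a, b):
    """Minimum-energy (pseudo-inverse) solution of the single equation a . f = b."""
    return a * b / np.dot(a, a)

def min_l1_on_line(a, b):
    """A minimum-l1 solution of a . f = b: all weight on the largest |a_i|."""
    f = np.zeros_like(a, dtype=float)
    k = np.argmax(np.abs(a))
    f[k] = b / a[k]
    return f

a = np.array([1.0, 2.0])   # the line x + 2y = 2
b = 2.0

f_l2 = min_l2_on_line(a, b)   # touches the circle: both components non-zero
f_l1 = min_l1_on_line(a, b)   # touches the diamond at a corner: one zero

print("l2 solution:", f_l2, "l1 norm:", np.abs(f_l2).sum())
print("l1 solution:", f_l1, "l1 norm:", np.abs(f_l1).sum())
```

Both points lie on the same line, but only the $\ell_1$ point is sparse, which is the behaviour the diamond-shaped boundary predicts.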

Fifth, can CS perform well in a noisy environment? Yes, it can, because the recovery algorithm keeps the $K$ most important components and forces the other components (including noise components) to zero. In image processing, the recovery algorithm will not smooth the image but will yield sharp edges.

Fig. 1. The geometries of the $\ell_p$ norm: (a) $p = 2$; (b) $p = 1$; (c) $p = 0.5$; (d) $p \to 0$.

Finally, let us review how CS encodes and decodes the time-domain signal $f$. The encoder gets the measurements $y$ or $\tilde{y}$ according to the observation matrix $M_0$. The decoder recovers $f$ or $\tilde{f}$ by solving the convex optimization problem (1)(2)(3)(4) with $y$ ($\tilde{y}$), $M_0$, and $\Psi$ (see footnote 1). Hence, CS can be seen as a fast encoder with a lower sampling rate (a smaller data set). The sampling rate depends only on the sparsity of $f$ in some domain and goes beyond the limit of the Nyquist/Shannon sampling theorem. However, CS faces a challenging problem: how to perfectly recover the signal with low computational complexity and memory?

3. Physics of Compressive Sensing 

The most important concepts of the CS theory involve 
Coherence and Measurement. 

In physics, coherence is a property of waves that enables stationary (i.e. temporally and spatially constant) interference. More generally, coherence describes all correlation properties between physical quantities of a wave. When interfering, two waves can add together to create a larger wave (constructive interference) or subtract from each other to create a smaller wave (destructive interference), depending on their relative phase. The coherence of two waves follows from how well correlated the waves are, as quantified by the cross-correlation function. The cross-correlation quantifies the ability to predict the value of the second wave by knowing the value of the first. As an example, consider two waves perfectly correlated for all times. At any time, if the first wave changes, the second will change in the same way. If combined, they can exhibit complete constructive interference at all times. It follows that they are perfectly coherent. The second wave need not be a separate entity. It could be the first wave at a different time or position. In this case, sometimes called self-coherence, the measure of correlation is the autocorrelation function. Take Thomas Young's double-slit experiment as an example: a coherent light source illuminates a thin plate with two parallel slits cut in it, and the light passing through the slits strikes a screen behind them. The wave nature of light causes the light waves passing through both slits to interfere, creating an interference pattern of bright and dark bands on the screen. In fact, the dark bands can be related to the zero components in the signal processing field. It is well known that we select basis functions coherent with the signal or image. If the signal is a square wave, the Haar wavelet is a good choice. If the signal is a sine wave, the Fourier transform is a good choice. The coherence index between a signal and a basis system decides the sparsity of the signal in the transform-domain. In other words, fewer basis functions will be used, or more components in the transform-domain will be zero, if the signal and the basis functions are coherent. For CS, however, the observation matrix $M_0$ and the time-domain signal $f$ should be incoherent. In addition, the observation matrix $M_0$ and the basis system $\Psi$ should also be incoherent. If this is not the case, the reconstruction matrix $M_0 \Psi$ in (5) will be sparse, which violates the UUP theorem.

Footnote 1: For (1)(4), $\Psi$ is not necessary.
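The difference between a coherent and an incoherent pair of systems can be quantified. The paper does not spell out the formula for its coherence index, so the following minimal sketch assumes the definition used by Candès and Wakin for two orthonormal bases, $\sqrt{n} \max_{k,j} |\langle \phi_k, \psi_j \rangle|$, which ranges from 1 (maximally incoherent) to $\sqrt{n}$. The function names and the choice of spikes versus cosines are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def coherence(Phi, Psi):
    """sqrt(n) * max |<phi_k, psi_j>| for orthonormal bases stored as columns."""
    n = Phi.shape[0]
    return np.sqrt(n) * np.max(np.abs(Phi.T @ Psi))

n = 64
spikes = np.eye(n)            # sampling in the time-domain
cosines = dct_matrix(n).T     # cosine basis vectors as columns

mu_incoherent = coherence(spikes, cosines)   # close to sqrt(2)
mu_coherent = coherence(spikes, spikes)      # equals sqrt(n)

print(mu_incoherent, mu_coherent)
```

Under this assumed definition, measuring in the same basis in which the signal is sparse gives the worst possible value, consistent with the requirement above that $M_0$ and $\Psi$ be incoherent.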

Next, let us talk about quantum mechanics and measurement. In physics, a wave function is a mathematical tool used in quantum mechanics to describe any physical system. The values of the wave function are probability amplitudes (complex numbers). The squared absolute values of the wave function, $|f|^2$, give the probability distribution (the chance of finding the subject at a certain time and position) that the system will be in any of the possible quantum states. The modern usage of the term wave function refers to a complex vector or function, i.e. an element of a complex Hilbert space. An element of a vector space can be expressed in different bases, and the same applies to wave functions. The components of a wave function describing the same physical state take different complex values depending on the basis being used; however, the wave function itself does not depend on the basis chosen. Similarly, in the signal processing field, we use different basis functions to represent the signal or image.

The quantum state of a system is a mathematical object that fully describes the quantum system. Once the quantum state has been prepared, some aspect of it is measured (for example, its position or energy). It is a postulate of quantum mechanics that every measurement has an associated operator (see footnote 2), called an observable operator. The expected result of the measurement is in general described not by a single number, but by a probability distribution that specifies the likelihoods that the various possible results will be obtained. The measurement process is often said to be random and indeterministic. Suppose we take a measurement corresponding to the observable operator $O$ on a system whose quantum state is $f$. The mean value (expectation value) of the measurement is $\langle f, O f \rangle$ and the variance of the measurement is $\langle f, O^2 f \rangle - (\langle f, O f \rangle)^2$. Each descriptor (mean, variance, etc.) of the measurement involves a part of the information of the quantum state. In the signal processing field, the observation matrix $M_0$ is a random matrix and each measurement $y_i$ captures a portion of the information of the signal $f$. Due to the UUP and RIP theorems, all the measurements make the same contribution to recovering $f$. In other words, each measurement $y_i$ is equally important or unimportant (see footnote 3). This unique property makes CS very powerful in the communication field (channel coding).

Footnote 2: For the discrete system, an operator can be seen as a matrix.
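For a discrete system, where the operator is a matrix, the mean and variance formulas above reduce to a few lines of linear algebra. This is a minimal sketch, assuming NumPy, a normalized state vector, and a Hermitian matrix; the 2 × 2 operator and the state are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def expectation(O, f):
    """Mean value <f, O f> for a normalized state f and Hermitian matrix O."""
    return np.real(np.vdot(f, O @ f))   # vdot conjugates its first argument

def variance(O, f):
    """Variance <f, O^2 f> - (<f, O f>)^2 of the measurement."""
    return expectation(O @ O, f) - expectation(O, f) ** 2

# Illustrative two-level system: eigenvalues +1 and -1, equal-weight state.
O = np.array([[1.0, 0.0],
              [0.0, -1.0]])
f = np.array([1.0, 1.0]) / np.sqrt(2.0)

mean = expectation(O, f)   # outcomes +1 and -1 are equally likely
var = variance(O, f)

print(mean, var)
```

Neither number alone determines the state, which mirrors the statement that each descriptor carries only part of the information.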

4. Gradient-Based Recovery Algorithms 

A fast, low-cost, and reliable recovery algorithm is the core of the CS theory. There is a lot of outstanding work on the topic [7] [8] [9] [10]. Based on this work, we developed the gradient-based recovery algorithms. In particular, we did not reshape the image (matrix) into a signal (vector), which would consume a large amount of memory. We treat each column of the image as a vector, and comparable results can still be obtained. For an image that is sparse in the time-domain, the $\ell_1$ norm constraint is used. For a general image (especially a geometric image), the total variation constraint is used. Considering the non-differentiability of the function $|f_{j,k}|$ at the origin, the subgradient or smooth approximation strategies [10] are employed.

A. Gradient Algorithms and Their Geometries 

Before solving the constrained convex optimization problems, a clear and deep understanding of the gradient-based algorithms is necessary. Given a linear matrix equation

$$M_0 f = y \qquad (9)$$

the solution $f$ can be found by solving the following minimization problem

$$\min L(f) = \frac{1}{2} \|M_0 f - y\|_2^2 \qquad (10)$$

The gradient-based algorithms for solving (10) can be written as

$$f^{i+1} = f^i - \mu^i \nabla L(f^i) \qquad (11)$$

where $\mu^i$ is the iteration step size and

$$\nabla L(f^i) = M_0^T (M_0 f^i - y) \qquad (12)$$

A variety of settings for $\mu^i$ result in different algorithms, including the gradient method, the steepest descent method, and Newton's method. The gradient method sets $\mu^i$ to a small constant. The steepest descent method sets $\mu^i$ to

$$\mu^i = \frac{\langle \nabla L(f^i), \nabla L(f^i) \rangle}{\langle \nabla L(f^i), M_0^T M_0 \nabla L(f^i) \rangle + \epsilon} \qquad (13)$$

which minimizes the residual $r^i = M_0 f^i - y$ in each iteration. Here, the small $\epsilon$ is used for avoiding a zero denominator. For Newton's method, $\mu^i$ is taken as a constant matrix

$$\mu^i = (M_0^T M_0 + \epsilon I)^{-1} \qquad (14)$$

Here, the small $\epsilon$ is used for avoiding a nearly singular matrix.

Footnote 3: For the traditional compression method, perfect reconstruction is impossible if some important components are lost.

To understand the geometries of the three gradient-based algorithms, a simple case is taken as an example. The 2-D function $f(x, y) = (x + y)^2 + (x + 1)^2 + (y + 3)^2$ has a local minimum at $f^* = (\frac{1}{3}, -\frac{5}{3})$. The contour of the function and the trajectory of $f^i$ are drawn in Fig. 2.



Fig. 2. The geometries for the gradient-based algorithms (plotted curves: Newton, steepest descent method, gradient).

The convergence of the gradient method is the worst. The steepest descent method converges fast during the first several steps but slows down as the iteration proceeds. Newton's method is the best and needs only one step for the 2-D case (see footnote 4).
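The three step-size rules (a small constant, (13), and (14)) can be compared on a tiny least-squares problem. This is a minimal sketch, assuming NumPy; the 3 × 2 system, the constant step 0.1, and the function name `solve` are illustrative choices and are not claimed to reproduce the data behind Fig. 2.

```python
import numpy as np

def solve(M0, y, rule, n_iter, eps=1e-12, const_step=0.1):
    """Iterate f <- f - mu * grad L(f) for L(f) = 0.5 * ||M0 f - y||^2."""
    f = np.zeros(M0.shape[1])
    A = M0.T @ M0
    newton_mu = np.linalg.inv(A + eps * np.eye(A.shape[0]))  # rule (14)
    for _ in range(n_iter):
        g = M0.T @ (M0 @ f - y)                 # gradient (12)
        if rule == "gradient":
            f = f - const_step * g              # small constant step
        elif rule == "steepest":
            mu = (g @ g) / (g @ A @ g + eps)    # step size (13)
            f = f - mu * g
        elif rule == "newton":
            f = f - newton_mu @ g               # constant matrix step
        else:
            raise ValueError("unknown rule: " + rule)
    return f

# Illustrative overdetermined system with a unique least-squares solution.
M0 = np.array([[1.0, 1.0],
               [1.0, 0.0],
               [0.0, 1.0]])
y = np.array([0.0, -1.0, -3.0])
f_star = np.linalg.lstsq(M0, y, rcond=None)[0]

err = {rule: np.linalg.norm(solve(M0, y, rule, n_iter=10) - f_star)
       for rule in ("gradient", "steepest", "newton")}
f_newton_one_step = solve(M0, y, "newton", n_iter=1)
print(err)
```

After the same number of iterations the constant-step method is furthest from the solution, steepest descent is closer, and a single Newton step already lands on the minimizer of the quadratic.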

Next, we apply the steepest descent method and Newton's method to recover the signal or image $f$ by using (1). The treatments of the other convex optimization problems (2) (3) (4) are similar.

B. $\ell_1$ Norm Strategy

We assume $f_{j,k}$ is the pixel of an $N \times N$ image $f$ at the $j$-th row and the $k$-th column ($1 \le j \le N$ and $1 \le k \le N$). The convex optimization problem (1) for the sparse image can be converted to

$$\min H(f) = L(f) + \lambda \|f\|_1 \qquad (15)$$

where $\|f\|_1 = \sum_{j,k} |f_{j,k}|$. The above equation is a Lagrange multiplier formulation. The first term relates to the underdetermined matrix equation (9), and the second, $\ell_1$-penalty term assures a regularized sparse solution. The parameter $\lambda$ balances the weight of the first term and the second term.

Because $|f_{j,k}|$ is not differentiable at the origin, we can define a new subgradient for each $f_{j,k}$ as follows

$$\nabla_{j,k} H(f) = \begin{cases} \nabla_{j,k} L(f) + \lambda\, \mathrm{sign}(f_{j,k}), & |f_{j,k}| \ge \epsilon \\ \nabla_{j,k} L(f) + \lambda, & |f_{j,k}| < \epsilon,\ \nabla_{j,k} L(f) < -\lambda \\ \nabla_{j,k} L(f) - \lambda, & |f_{j,k}| < \epsilon,\ \nabla_{j,k} L(f) > \lambda \\ 0, & |f_{j,k}| < \epsilon,\ |\nabla_{j,k} L(f)| \le \lambda \end{cases} \qquad (16)$$

Then the gradient-based algorithm can be written as

$$f_{j,k}^{i+1} = f_{j,k}^{i} - \mu_k^i \nabla_{j,k} H(f^i) \qquad (17)$$

where $j$ and $k$ are, respectively, the row index and the column index of the image $f$, and $i$ denotes the $i$-th iteration step. The step size $\mu_k^i$ has been given in (13) and (14). Bear in mind that the image is treated column by column when computing $\mu_k^i$ and $\nabla_{j,k} L(f^i)$.

For the steepest descent method, the parameter $\lambda$ can be taken as a small positive constant ($\lambda$ between 0.001 and 0.01). For Newton's method, however, the parameter $\lambda$ must be gradually decreased as the iteration proceeds ($\lambda^{i+1} = c \times \lambda^i$ with $c$ between 0.99 and 0.999).

Footnote 4: For CS, the performance of Newton's method will decrease due to the nearly singular observation matrix and the $\ell_1$ norm constraint.
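The subgradient rule (16) and the steepest descent update for one image column can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names, the default values of $\lambda$ and $\epsilon$, the iteration count, and the random test problem are assumptions made for the example and are not the settings used in the experiments.

```python
import numpy as np

def l1_subgradient(grad_L, f, lam, eps=1e-6):
    """Element-wise subgradient of L(f) + lam * ||f||_1, following the four cases of (16)."""
    small = np.abs(f) < eps
    out = grad_L + lam * np.sign(f)                                  # |f| >= eps
    out = np.where(small & (grad_L < -lam), grad_L + lam, out)
    out = np.where(small & (grad_L > lam), grad_L - lam, out)
    out = np.where(small & (np.abs(grad_L) <= lam), 0.0, out)
    return out

def recover_column(M0, y, lam=0.01, n_iter=500, eps=1e-12):
    """Steepest descent on one column: step size from (13), direction from (16)."""
    f = np.zeros(M0.shape[1])
    for _ in range(n_iter):
        g = M0.T @ (M0 @ f - y)            # gradient of L, eq. (12)
        Mg = M0 @ g
        mu = (g @ g) / (Mg @ Mg + eps)     # g^T M0^T M0 g = ||M0 g||^2
        f = f - mu * l1_subgradient(g, f, lam)
    return f

# Illustrative problem: one 64-pixel column with 2 non-zero pixels, 20 measurements.
rng = np.random.default_rng(1)
M0 = rng.normal(size=(20, 64))
f_true = np.zeros(64)
f_true[[10, 40]] = [1.0, -0.5]
y = M0 @ f_true
f_rec = recover_column(M0, y)
print("residual norm:", np.linalg.norm(M0 @ f_rec - y))
```

For an image, the same routine would be called once per column, which is the column-by-column treatment described above.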

C. Total Variation Strategy

A general image, especially a geometric image, is not sparse in the time-domain. Hence, the $\ell_1$ norm strategy developed in the previous subsection will break down.

The convex optimization problem for the general image can be given by

$$\min H(f) = L(f) + \lambda\, TV(f) \qquad (18)$$

where $TV(f)$ is the total variation of the image $f$. The derivatives of $f$ along the vertical and horizontal directions can be defined as

$$D^v_{j,k} f = \begin{cases} f_{j,k} - f_{j+1,k}, & 1 \le j < N \\ 0, & j = N \end{cases} \qquad (19)$$

$$D^h_{j,k} f = \begin{cases} f_{j,k} - f_{j,k+1}, & 1 \le k < N \\ 0, & k = N \end{cases} \qquad (20)$$

The total variation of the image $f$ is the sum of the magnitudes of the gradient at each pixel [11]

$$TV(f) = \sum_{j,k} \sqrt{(D^v_{j,k} f)^2 + (D^h_{j,k} f)^2} = \sum_{j,k} |\nabla_{j,k} f| \qquad (21)$$

After some simple derivations, the gradient of the total variation with respect to each pixel is given by

$$\nabla_{j,k} (TV(f)) = \frac{D^v_{j,k} f + D^h_{j,k} f}{|\nabla_{j,k} f|} - \frac{D^v_{j-1,k} f}{|\nabla_{j-1,k} f|} - \frac{D^h_{j,k-1} f}{|\nabla_{j,k-1} f|} \qquad (22)$$

where terms whose index falls outside the image are omitted. When treating (22), the smooth approximation strategy is used for avoiding a zero denominator, i.e.

$$|\nabla_{j,k} f| = \sqrt{(D^v_{j,k} f)^2 + (D^h_{j,k} f)^2 + \epsilon} \qquad (23)$$

The gradient-based algorithm has been given in (17).
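The total variation with the smooth approximation (23) and its per-pixel gradient can be sketched directly from the forward differences (19) and (20). This is a minimal illustration, assuming NumPy; the function names are assumptions, and the gradient is obtained by differentiating the smoothed sum (21) with respect to each pixel, which is one way to carry out the "simple derivations" mentioned above.

```python
import numpy as np

def tv_parts(f, eps=1e-8):
    """Forward differences (zero on the last row/column) and smoothed magnitude."""
    dv = np.zeros_like(f)
    dh = np.zeros_like(f)
    dv[:-1, :] = f[:-1, :] - f[1:, :]    # vertical difference, eq. (19)
    dh[:, :-1] = f[:, :-1] - f[:, 1:]    # horizontal difference, eq. (20)
    mag = np.sqrt(dv ** 2 + dh ** 2 + eps)   # smooth approximation, eq. (23)
    return dv, dh, mag

def total_variation(f, eps=1e-8):
    """Smoothed total variation: sum of gradient magnitudes over all pixels."""
    return tv_parts(f, eps)[2].sum()

def tv_gradient(f, eps=1e-8):
    """Derivative of the smoothed total variation with respect to each pixel."""
    dv, dh, mag = tv_parts(f, eps)
    gv = dv / mag
    gh = dh / mag
    g = gv + gh                # pixel (j,k) enters its own two differences with +1
    g[1:, :] -= gv[:-1, :]     # and the vertical difference of pixel (j-1,k) with -1
    g[:, 1:] -= gh[:, :-1]     # and the horizontal difference of pixel (j,k-1) with -1
    return g

# Illustrative piecewise-constant image: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
print(total_variation(image))
```

A quick way to gain confidence in such a gradient routine is to compare it with finite differences of `total_variation` on a small random image.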

5. Numerical Experiments and Results 

A. Cases of $\ell_1$ Norm Strategy

The first image we would like to recover is a 64 × 64 sparse diamond, as shown in Fig. 3.

Fig. 3. The 64 × 64 sparse diamond.

Because the image itself is sparse in the time-domain, we need not transform it into other domains, such as the wavelet domain or Fourier domain. The size of the observation matrix for nearly perfect reconstruction should be at least 12 × 64, where 12 is calculated by $2 \cdot \log_2 64 = 12$. With observation matrices generated from a uniform distribution with upper limit 1, Fig. 4 is obtained after 20000 iteration steps. Subplot (a) shows the image recovered with a 10 × 64 observation matrix, while (b), (c), and (d) are recovered with 12 × 64, 15 × 64, and 20 × 64 random observation matrices, respectively. When the observation matrix is small, the reconstructed images are poor, as shown in (a) and (b). As the observation matrix gets larger, better results are obtained, as shown in (c) and (d).

Fig. 4. The recovered diamond figures by Newton's method from the observation matrices with different sizes: (a) 10 × 64; (b) 12 × 64; (c) 15 × 64; (d) 20 × 64.

Another 64 × 64 image we wish to recover is a circle, as shown in Fig. 5. Although the circle is sparse on the whole, this is not the case for some columns. These columns may require more measurements if we treat an n × n image as n vectors. Using the steepest descent method, we can recover the image after several iterations. Here, we do not want to compare Newton's method with the steepest descent method to find which one is more powerful and effective. (Actually, their performances are almost the same for this image.) What we want to show is that the sparsity of an image strongly affects the recovered results. Fig. 6 shows the results recovered from observation matrices of different sizes, slightly larger than in the previous case. For subplots (a) and (b), the results are undesirable. Subplot (c) is better, and almost perfect reconstruction is obtained for subplot (d). Again, we notice that for a small observation matrix, the recovered results may vary drastically. If the observation matrix is large enough, the reconstructed image is accurate or even exact in any repeated experiment. Subplots (a), (b), and (c) in Fig. 6 show that the columns which are less sparse are the hardest to recover when a small observation matrix is used. For this case, the total variation strategy may be a better choice. In short, a large observation matrix captures more information about the image, and therefore the image can be recovered with higher probability.

Fig. 5. The 64 × 64 sparse circle.

B. Cases of Total Variation Strategy 

If the image is not sparse in the time-domain, it can also be recovered from the measurements without being represented in basis functions $\Psi$ whose coefficients may be sparse. For instance, a geometric figure composed of a solid circle and a solid square is shown in Fig. 7. It is easy to imagine that the derivatives of the image are sparse. We apply the total variation strategy to recover the image.

Fig. 6. The recovered circle figures by the steepest descent method from the observation matrices with different sizes: (a) 15 × 64; (b) 20 × 64; (c) 25 × 64; (d) 30 × 64.

Fig. 7. The geometric figure.

The size of the object image is again 64 × 64, and a 20 × 64 observation matrix is employed. Fig. 8 shows typical measurements $y$, and Fig. 9 is the image reconstructed from $y$. The calculated peak signal to noise ratio (PSNR) is always above 90 in repeated experiments (different observation matrices of the same size), which suggests that the observation matrix is large enough to recover the original image with an extremely accurate result.

Fig. 8. The measurement of the geometric figure.

Fig. 9. The recovered geometric image by the steepest descent method: 20 × 64 observation matrix is employed.
The next two images, Cameraman and Boats in Fig. 10, are well known in the image processing field. We still treat the 256 × 256 image as 256 vectors. Although the result may not be as good as the one obtained by treating the 256 × 256 image as a long vector of size 65536 × 1, we save a great amount of memory and calculation time. The size of our observation matrix $M_0$ is 100 × 256 rather than 25600 × 65536 (25600 = 65536/2.56). The recovery algorithm is implemented in the wavelet-domain, and the PSNR values for subplots (a) and (b) in Fig. 11 are 29.4 and 30.9, respectively. In addition, the gradient-based total variation algorithm also performs well for the geometric image Peppers. (Showing too many results is not necessary.)
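The memory argument for the column-by-column treatment can be made concrete. This is a minimal sketch, assuming NumPy; the random image and Gaussian observation matrix are placeholders, and only the shapes are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 100
image = rng.random((N, N))        # placeholder for a 256 x 256 image
M0 = rng.normal(size=(M, N))      # one 100 x 256 observation matrix

# Measuring every column with the same M0 is a single matrix product:
# column k of Y equals M0 @ image[:, k].
Y = M0 @ image

# Storage needed for the observation matrix in the two treatments.
col_entries = M * N                   # 100 x 256 matrix
full_entries = (M * N) * (N * N)      # 25600 x 65536 matrix for the long vector

print(Y.shape, col_entries, full_entries)
```

The total number of measurements is the same in both treatments (25600), but the matrix that must be stored and multiplied is smaller by a factor of 65536.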



Fig. 10. The general image: (a) Cameraman; (b) Boats.

Fig. 11. The recovered images by Newton's method: 100 × 256 observation matrix is employed. (a) Cameraman; (b) Boats.

6. Discussions and Future Work 

The above are just some simple experiments demonstrating that CS is able to recover an image accurately from a few random projections. One should understand that the main advantage of CS is not how small it can compress the image. In fact, if a signal is $K$-sparse in some domain, we still require $3K$ to $5K$ measurements to recover the signal. An obvious advantage of CS is that it can encode the signal or image fast. In particular, prior knowledge about the signal is not important. For example, it is not necessary for us to know the exact positions and values of the most important components beforehand. What we care about is whether the image is sparse in some domain or not. A fixed observation matrix can be applied to measure different signals, which makes the application of CS to encoding and decoding possible. Meanwhile, all measurements play the same role in recovering the signal or image, which makes CS very powerful in military applications (radar imaging) where we cannot afford the risk caused by the loss of the $K$ most important components. Since each random projection (measurement) is equally (un)important, CS is not sensitive to noise or measurement errors and can provide robust and stable performance.

Although many researchers have made great progress on the convex optimization problems and demonstrated accurate results on the scales of interest (hundreds of thousands of measurements and millions of pixels), more efficient algorithms are still required. Actually, solving the $\ell_1$ minimization problem is about 30-50 times as expensive as solving the least squares problem. However, this unbalanced computational burden gives us an opportunity: the measurements can be acquired by low-power sensors, and the signal or image can then be recovered on a central supercomputer. Algorithms such as the conjugate gradient method and the generalized minimal residual method will be our next candidates for accelerating the recovery algorithm.

The physical understanding and the applications of CS are still under way, although a single-pixel camera has already shocked the field of optics. We are aware that CS has penetrated many fields and become a hotspot. We expect more mathematicians, physicists, and engineers to make contributions to the CS field.

7. Acknowledgement 

The authors are not from the signal or image processing community. Hence, some technical words may not be right. We hope the report can be of some help to researchers from the signal processing, image processing, and physics communities.

References 

1. Emmanuel J. Candès, Justin K. Romberg, and Terence Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Transactions On Information Theory, vol. 52, pp. 489-509, Feb. 2006.

2. David L. Donoho, "Compressed sensing," IEEE Transactions On Information Theory, vol. 52, pp. 1289-1306, Apr. 2006.

3. Emmanuel J. Candès and Michael B. Wakin, "People hearing without listening: An introduction to compressive sampling," Report.

4. Richard G. Baraniuk, "Compressive sensing," IEEE Signal Processing Magazine, pp. 118-124, Jul. 2007.

5. Emmanuel J. Candès and Michael B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, pp. 21-30, Mar. 2008.

6. Justin K. Romberg, "Imaging via compressive sampling," IEEE Signal Processing Magazine, pp. 14-20, Mar. 2008.

7. Emmanuel J. Candès and Justin K. Romberg, "Signal recovery from random projections," Proceedings of SPIE-IS&T Electronic Imaging, vol. 5674, pp. 76-86, Mar. 2005.

8. Emmanuel J. Candès and Terence Tao, "Decoding by linear programming," IEEE Transactions On Information Theory, vol. 51, pp. 4203-4215, Dec. 2005.

9. Joel A. Tropp and Anna C. Gilbert, "Signal recovery from partial information via orthogonal matching pursuit," IEEE Transactions On Information Theory, vol. 53, pp. 4655-4666, Dec. 2007.

10. Mark Schmidt, Glenn Fung, and Romer Rosales, "Fast optimization methods for L1 regularization: A comparative study and two new approaches," Machine Learning: ECML 2007, vol. 4701, Sep. 2007.

11. Leonid I. Rudin, Stanley Osher, and Emad Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259-268, Nov. 1992.






